Outlier detection for acoustic model training using robust statistics

Authors

  • Shigeki Matsuda
  • Wolfgang Herbordt
  • Satoshi Nakamura
Abstract

In this paper, we propose an acoustic model training technique that is robust against outliers such as clipping, unexpected noise, poorly pronounced word segments, and mistranscriptions. Such outliers deteriorate the quality of the acoustic models and in turn decrease speech recognition performance. The outlier-robust training technique is based on a maximum likelihood (ML) criterion and automatically detects and removes outliers from the training data. Experiments with training data artificially contaminated by mistranscriptions show that the proposed technique achieves nearly the same word error rate on contaminated data as on uncontaminated data. Application to a dialogue speech database with unknown outliers reduces the errors by 4.03%.
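The abstract does not spell out the detection procedure, so the following is only a minimal sketch of one generic way to combine an ML criterion with outlier removal: a trimmed-likelihood fit of a single diagonal Gaussian. The function name `trimmed_ml_gaussian`, the `trim_fraction` parameter, and the single-Gaussian model are assumptions made for illustration; they are not taken from the paper, whose acoustic models and detection rule may differ substantially.

```python
import numpy as np


def trimmed_ml_gaussian(x, trim_fraction=0.1, n_iter=20):
    """Hypothetical sketch: ML fit of a diagonal Gaussian that ignores
    the lowest-likelihood samples (treated here as outliers).

    x: array of shape (n_samples, n_features).
    Returns (mean, var, keep), where keep is a boolean mask of retained samples.
    """
    x = np.asarray(x, dtype=float)
    n = x.shape[0]
    # Number of samples retained after trimming (assumed fixed in advance).
    n_keep = max(1, n - int(np.floor(trim_fraction * n)))
    keep = np.ones(n, dtype=bool)  # start by trusting every sample

    for _ in range(n_iter):
        # ML estimates from the currently retained samples.
        mean = x[keep].mean(axis=0)
        var = x[keep].var(axis=0) + 1e-6  # small floor avoids division by zero
        # Per-sample log-likelihood under the current diagonal Gaussian.
        loglik = -0.5 * np.sum(
            np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var, axis=1
        )
        # Retain the n_keep most likely samples; the rest are flagged as outliers.
        new_keep = np.zeros(n, dtype=bool)
        new_keep[np.argsort(loglik)[::-1][:n_keep]] = True
        if np.array_equal(new_keep, keep):
            break  # retained set is stable
        keep = new_keep

    # Final ML estimates from the retained set.
    mean = x[keep].mean(axis=0)
    var = x[keep].var(axis=0) + 1e-6
    return mean, var, keep
```

Calling `trimmed_ml_gaussian(features, trim_fraction=0.1)` on a feature matrix returns the re-estimated parameters together with a mask of the samples that were kept. A fixed trim fraction is the simplest choice; it discards that share of the data even when no outliers are present, which is one reason a practical system would likely use a more careful detection rule.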

Similar resources

Outlier Detection for Support Vector Machine using Minimum Covariance Determinant Estimator

The purpose of this paper is to identify the points that influence the performance of one of the important algorithms of data mining, namely the support vector machine. The final classification decision is made based on a small portion of the data called support vectors. The existence of atypical observations among these points will therefore result in deviation from the correct decision. Thus...

Simulation of Scour Pattern Around Cross-Vane Structures Using Outlier Robust Extreme Learning Machine

In this research, the scour hole depth at the downstream of cross-vane structures with different shapes (i.e., J, I, U, and W) was simulated utilizing a modern artificial intelligence method entitled "Outlier Robust Extreme Learning Machine (ORELM)". The observational data were divided into two groups: training (70%) and test (30%). Then, using the input parameters including the ratio of the st...

Robust high-dimensional semiparametric regression using optimized differencing method applied to the vitamin B2 production data

Background and purpose: As science, knowledge, and technology evolve, we deal with high-dimensional data in which the number of predictors may considerably exceed the sample size. The main problems with high-dimensional data are the estimation of the coefficients and interpretation. For high-dimensional problems, classical methods are not reliable because of a large number of predictor variable...

Identification of outliers types in multivariate time series using genetic algorithm

Multivariate time series data are often modeled using a vector autoregressive moving average (VARMA) model. However, the presence of outliers can violate the stationarity assumption and may lead to wrong modeling, biased estimation of parameters, and inaccurate prediction. Thus, detection of these points and how to deal properly with them, especially in relation to modeling and parameter estimation of VARMA m...

Detection of Outliers and Influential Observations in Linear Ridge Measurement Error Models with Stochastic Linear Restrictions

The aim of this paper is to propose some diagnostic methods in linear ridge measurement error models with stochastic linear restrictions using the corrected likelihood. Based on the bias-corrected estimation of model parameters, diagnostic measures are developed to identify outlying and influential observations. In addition, we derive the corrected score test statistic for outliers detection ba...

Publication date: 2005